宫颈癌是女性中一种非常常见和致命的癌症类型。细胞病理学图像通常用于筛选这种癌症。鉴于在手动筛查期间可能发生许多错误,已经开发了一种基于深度学习的计算机辅助诊断系统。深度学习方法需要输入图像的固定维度,但临床医学图像的尺寸不一致。图像的纵横比在直接调整它们的同时受到影响。临床上,细胞病理学图像内的细胞的纵横比为医生诊断癌症提供重要信息。因此,很难直接调整大小。然而,许多现有研究直接调整了图像的大小,并获得了高度稳健的分类结果。为了确定合理的解释,我们进行了一系列比较实验。首先,预处理SipakMed数据集的原始数据以获得标准和缩放数据集。然后,将数据集调整为224 x 224像素。最后,22种深度学习模型用于分类标准和缩放数据集。该研究的结果表明,深度学习模型对宫颈细胞病理学图像中细胞的纵横比变化是鲁棒的。此结论也通过Herlev DataSet验证。
translated by 谷歌翻译
基于查询的变压器在许多图像域任务中构建长期注意力方面表现出了巨大的潜力,但是由于点云数据的压倒性大小,在基于激光雷达的3D对象检测中很少考虑。在本文中,我们提出了CenterFormer,这是一个基于中心的变压器网络,用于3D对象检测。 CenterFormer首先使用中心热图在基于标准的Voxel点云编码器之上选择中心候选者。然后,它将中心候选者的功能用作变压器中的查询嵌入。为了进一步从多个帧中汇总功能,我们通过交叉注意设计一种方法来融合功能。最后,添加回归头以预测输出中心功能表示形式上的边界框。我们的设计降低了变压器结构的收敛难度和计算复杂性。结果表明,与无锚对象检测网络的强基线相比,有了显着改善。 CenterFormer在Waymo Open数据集上实现了单个模型的最新性能,验证集的MAPH为73.7%,测试集的MAPH上有75.6%的MAPH,大大优于所有先前发布的CNN和基于变压器的方法。我们的代码可在https://github.com/tusimple/centerformer上公开获取
translated by 谷歌翻译
为了减轻传统推荐系统(RSS)的数据稀疏和冷启动问题,将知识图(KGS)纳入补充辅助信息,最近引起了相当大的关注。然而,简单地整合了基于KG的RS模型的KGS,这不一定是提高推荐性能的保证,甚至可能削弱整体模型能力。这是因为这些KG的构建与历史用户项相互作用的集合无关;因此,这些KG的信息可能并不总是有助于推荐给所有用户。在本文中,我们提出了具有个性化推荐的协作指导的细心知识意识的图表卷积网络(CG-KGR)。 CG-KGR是一种新颖的知识意识推荐模型,通过我们提出的协作指导机制,可以实现高度和相干的KG和用户项目交互的学习。具体而言,CG-KGR首先封装与交互式信息摘要的历史相互作用。然后CG-kgr利用它作为提取kgs的信息的指导,最终提供更精确的个性化推荐。我们在两个推荐任务中对四个现实数据集进行了广泛的实验,即TOP-K推荐和点击率(CTR)预测。实验结果表明,CG-KGR模型在Top-K推荐的召回度量方面,最近最初的最先进模型明显优于1.4-27.0%。
translated by 谷歌翻译
A recent study has shown a phenomenon called neural collapse in that the within-class means of features and the classifier weight vectors converge to the vertices of a simplex equiangular tight frame at the terminal phase of training for classification. In this paper, we explore the corresponding structures of the last-layer feature centers and classifiers in semantic segmentation. Based on our empirical and theoretical analysis, we point out that semantic segmentation naturally brings contextual correlation and imbalanced distribution among classes, which breaks the equiangular and maximally separated structure of neural collapse for both feature centers and classifiers. However, such a symmetric structure is beneficial to discrimination for the minor classes. To preserve these advantages, we introduce a regularizer on feature centers to encourage the network to learn features closer to the appealing structure in imbalanced semantic segmentation. Experimental results show that our method can bring significant improvements on both 2D and 3D semantic segmentation benchmarks. Moreover, our method ranks 1st and sets a new record (+6.8% mIoU) on the ScanNet200 test leaderboard. Code will be available at https://github.com/dvlab-research/Imbalanced-Learning.
translated by 谷歌翻译
Weakly-supervised object localization aims to indicate the category as well as the scope of an object in an image given only the image-level labels. Most of the existing works are based on Class Activation Mapping (CAM) and endeavor to enlarge the discriminative area inside the activation map to perceive the whole object, yet ignore the co-occurrence confounder of the object and context (e.g., fish and water), which makes the model inspection hard to distinguish object boundaries. Besides, the use of CAM also brings a dilemma problem that the classification and localization always suffer from a performance gap and can not reach their highest accuracy simultaneously. In this paper, we propose a casual knowledge distillation method, dubbed KD-CI-CAM, to address these two under-explored issues in one go. More specifically, we tackle the co-occurrence context confounder problem via causal intervention (CI), which explores the causalities among image features, contexts, and categories to eliminate the biased object-context entanglement in the class activation maps. Based on the de-biased object feature, we additionally propose a multi-teacher causal distillation framework to balance the absorption of classification knowledge and localization knowledge during model training. Extensive experiments on several benchmarks demonstrate the effectiveness of KD-CI-CAM in learning clear object boundaries from confounding contexts and addressing the dilemma problem between classification and localization performance.
translated by 谷歌翻译
Witnessing the impressive achievements of pre-training techniques on large-scale data in the field of computer vision and natural language processing, we wonder whether this idea could be adapted in a grab-and-go spirit, and mitigate the sample inefficiency problem for visuomotor driving. Given the highly dynamic and variant nature of the input, the visuomotor driving task inherently lacks view and translation invariance, and the visual input contains massive irrelevant information for decision making, resulting in predominant pre-training approaches from general vision less suitable for the autonomous driving task. To this end, we propose PPGeo (Policy Pre-training via Geometric modeling), an intuitive and straightforward fully self-supervised framework curated for the policy pretraining in visuomotor driving. We aim at learning policy representations as a powerful abstraction by modeling 3D geometric scenes on large-scale unlabeled and uncalibrated YouTube driving videos. The proposed PPGeo is performed in two stages to support effective self-supervised training. In the first stage, the geometric modeling framework generates pose and depth predictions simultaneously, with two consecutive frames as input. In the second stage, the visual encoder learns driving policy representation by predicting the future ego-motion and optimizing with the photometric error based on current visual observation only. As such, the pre-trained visual encoder is equipped with rich driving policy related representations and thereby competent for multiple visuomotor driving tasks. Extensive experiments covering a wide span of challenging scenarios have demonstrated the superiority of our proposed approach, where improvements range from 2% to even over 100% with very limited data. Code and models will be available at https://github.com/OpenDriveLab/PPGeo.
translated by 谷歌翻译
In this work, we focus on instance-level open vocabulary segmentation, intending to expand a segmenter for instance-wise novel categories without mask annotations. We investigate a simple yet effective framework with the help of image captions, focusing on exploiting thousands of object nouns in captions to discover instances of novel classes. Rather than adopting pretrained caption models or using massive caption datasets with complex pipelines, we propose an end-to-end solution from two aspects: caption grounding and caption generation. In particular, we devise a joint Caption Grounding and Generation (CGG) framework based on a Mask Transformer baseline. The framework has a novel grounding loss that performs explicit and implicit multi-modal feature alignments. We further design a lightweight caption generation head to allow for additional caption supervision. We find that grounding and generation complement each other, significantly enhancing the segmentation performance for novel categories. We conduct extensive experiments on the COCO dataset with two settings: Open Vocabulary Instance Segmentation (OVIS) and Open Set Panoptic Segmentation (OSPS). The results demonstrate the superiority of our CGG framework over previous OVIS methods, achieving a large improvement of 6.8% mAP on novel classes without extra caption data. Our method also achieves over 15% PQ improvements for novel classes on the OSPS benchmark under various settings.
translated by 谷歌翻译
Nearest-Neighbor (NN) classification has been proven as a simple and effective approach for few-shot learning. The query data can be classified efficiently by finding the nearest support class based on features extracted by pretrained deep models. However, NN-based methods are sensitive to the data distribution and may produce false prediction if the samples in the support set happen to lie around the distribution boundary of different classes. To solve this issue, we present P3DC-Shot, an improved nearest-neighbor based few-shot classification method empowered by prior-driven data calibration. Inspired by the distribution calibration technique which utilizes the distribution or statistics of the base classes to calibrate the data for few-shot tasks, we propose a novel discrete data calibration operation which is more suitable for NN-based few-shot classification. Specifically, we treat the prototypes representing each base class as priors and calibrate each support data based on its similarity to different base prototypes. Then, we perform NN classification using these discretely calibrated support data. Results from extensive experiments on various datasets show our efficient non-learning based method can outperform or at least comparable to SOTA methods which need additional learning steps.
translated by 谷歌翻译
In this tutorial paper, we look into the evolution and prospect of network architecture and propose a novel conceptual architecture for the 6th generation (6G) networks. The proposed architecture has two key elements, i.e., holistic network virtualization and pervasive artificial intelligence (AI). The holistic network virtualization consists of network slicing and digital twin, from the aspects of service provision and service demand, respectively, to incorporate service-centric and user-centric networking. The pervasive network intelligence integrates AI into future networks from the perspectives of networking for AI and AI for networking, respectively. Building on holistic network virtualization and pervasive network intelligence, the proposed architecture can facilitate three types of interplay, i.e., the interplay between digital twin and network slicing paradigms, between model-driven and data-driven methods for network management, and between virtualization and AI, to maximize the flexibility, scalability, adaptivity, and intelligence for 6G networks. We also identify challenges and open issues related to the proposed architecture. By providing our vision, we aim to inspire further discussions and developments on the potential architecture of 6G.
translated by 谷歌翻译
In this paper, we investigate the joint device activity and data detection in massive machine-type communications (mMTC) with a one-phase non-coherent scheme, where data bits are embedded in the pilot sequences and the base station simultaneously detects active devices and their embedded data bits without explicit channel estimation. Due to the correlated sparsity pattern introduced by the non-coherent transmission scheme, the traditional approximate message passing (AMP) algorithm cannot achieve satisfactory performance. Therefore, we propose a deep learning (DL) modified AMP network (DL-mAMPnet) that enhances the detection performance by effectively exploiting the pilot activity correlation. The DL-mAMPnet is constructed by unfolding the AMP algorithm into a feedforward neural network, which combines the principled mathematical model of the AMP algorithm with the powerful learning capability, thereby benefiting from the advantages of both techniques. Trainable parameters are introduced in the DL-mAMPnet to approximate the correlated sparsity pattern and the large-scale fading coefficient. Moreover, a refinement module is designed to further advance the performance by utilizing the spatial feature caused by the correlated sparsity pattern. Simulation results demonstrate that the proposed DL-mAMPnet can significantly outperform traditional algorithms in terms of the symbol error rate performance.
translated by 谷歌翻译